Construction of a high-density genetic map and identification of QTLs related to agronomic and physiological traits in an interspecific (Gossypium hirsutum × Gossypium barbadense) F2 population

Background Advances in genome sequencing technology, particularly restriction-site associated DNA sequencing (RAD-seq) and whole-genome resequencing, have greatly aided the construction of cotton interspecific genetic maps based on single nucleotide polymorphisms (SNPs), indels, and other types of markers. High-density genetic maps can improve the accuracy of quantitative trait locus (QTL) mapping, narrow the mapping intervals, and facilitate the identification of candidate genes. Results In this study, 249 individuals from an interspecific F2 population (TM-1 × Hai7124) were resequenced, yielding 6303 high-confidence bin markers spanning 5057.13 cM across the 26 cotton chromosomes. A total of 3380 recombination hot regions (RHRs) were identified, and these were unevenly distributed across the 26 chromosomes. Based on this map, 112 QTLs relating to agronomic and physiological traits from the seedling to the boll opening stage were identified, including 15 loci associated with 14 traits that contained genes harboring nonsynonymous SNPs. We analyzed the sequence and expression of ten candidate genes within these loci and found that GhRHD3 (GH_D10G0500) may affect fiber yield, while GhGPAT6 (GH_D04G1426) may affect photosynthesis efficiency. Conclusions Our research illustrates the efficiency of constructing a genetic map with the bin-map approach and of QTL mapping in an early-generation population of sufficient size. The high-density genetic map captures recombination events that are high in number and broad in distribution. The QTLs and candidate genes identified from this map may provide important gene resources for the genetic improvement of cotton. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08528-2.

G. barbadense provides excellent fiber that is finer, longer, and stronger than that of G. hirsutum [3,4]. Efficient and extensive transfer of valuable genes between G. barbadense and G. hirsutum is of great practical significance for improving fiber quality while maintaining fiber yield, but such transfer is mainly limited by linkage drag.
Quantitative traits exhibit continuous variation and are generally controlled by multiple genes, hence having a complex genetic basis; moreover, they are readily affected by the environment. Genetic research on quantitative traits is therefore difficult, and investigating the inheritance and QTL mapping of cotton quantitative traits is of great significance to the advancement of cotton genetics and breeding. Since Shappley et al. [5] constructed the first genetic map of cotton, many studies have conducted QTL mapping for important cotton traits.
A high-density molecular genetic map is the foundation of plant genome research. Interspecific maps have been constructed for cotton, mainly between G. barbadense and G. hirsutum, and used to explore species differences such as those in yield and quality traits. These studies have provided very useful information for cotton molecular design and breeding. There are many QTL-enriched regions in the cotton genome, and these may contain large numbers of related genes that play important roles in the plant's growth and development [28]. Notably, QTLs for important traits are unevenly distributed among the 26 chromosomes of cotton. In interspecific populations, fiber quality QTLs are more typically located in the A subgenome, while in intraspecific populations, fiber yield and quality QTLs are more frequent in the D subgenome [29,30]. Although DD diploid species do not have spinnable fibers, many studies have shown that the D subgenome of allotetraploid cotton contains many QTLs that control fiber quality [31,32]. However, while previous studies have revealed these and other useful findings, the different populations and markers employed, combined with the impact of environmental factors on QTL effects, mean that the comparability of existing data is relatively poor. Therefore, QTL research on cotton is still advancing.
Recent advances in genome sequencing technology allow the construction of ultra-high-density genetic maps based on SNP loci. Consequently, more comprehensive and accurate map information can be used to analyze QTLs associated with important traits. Since bin genetic linkage maps based on SNP loci were first constructed in rice [33], the approach has been widely applied in other plants such as cotton [20], maize [34], soybean [35], Cucumis melo [36], and radish [37]. These genetic linkage maps have yielded many fine-mapped QTLs for which the corresponding target genes were identified and cloned.
Recent high-quality assemblies of G. barbadense and G. hirsutum [2, 38-42] have provided good references for linkage map-based QTL identification. In light of these resources, we constructed an interspecific F2 population between G. hirsutum and G. barbadense and performed whole-genome sequencing of all 249 F2 individuals, obtaining resequencing data with an average genome coverage of over 5× per individual and generating a high-density genetic map containing 6303 bin markers. Based on the map, we subsequently identified 112 QTLs associated with an array of traits, including plant type traits and physiological traits at the seedling stage, leaf chlorophyll content, plant type traits at the flower and boll stage, yield traits, and fiber quality traits. By combining the SNPs located within the predicted genes in each target region with the expression patterns of those genes, we identified possible causal genes responsible for the mapped traits. These QTLs and the related candidate genes are valuable in cotton breeding to improve plant biomass, physiological characteristics, yield, and fiber quality.

Plant materials and DNA extraction
The plant materials consisted of G. hirsutum acc. TM-1, supplied by Dr. Kohel of the Southern Plain Agricultural Research Center, USDA [43], and G. barbadense cv. Hai7124, which was selected by the Cotton Research Institute of Nanjing Agricultural University for genetic research [17]. TM-1 is a genetic standard line of G. hirsutum developed through single plant selection. Hai7124, grown extensively in China, was likewise the offspring of a single plant selection before being used as a parent in the construction of the linkage map. The two highly homozygous parents, as well as 249 F2 individuals derived from a cross between TM-1 as the recipient and Hai7124 as the donor, were planted in the Pailou greenhouse of Nanjing Agricultural University, Jiangsu, China. Genomic DNA was extracted from young leaf tissues following the cetyltrimethylammonium bromide (CTAB) method described by Paterson [44], with increased RNase A and proteinase K treatment to prevent RNA and protein contamination. The isolated DNA was then subjected to Illumina sequencing.
To obtain phenotypic data for the two parents, the F1, and all 249 F2 individual plants in different environments, all plants were cut back at the main stem, transferred into large nutrient bowls, and moved into the greenhouse in autumn. The following spring, these materials were planted in the field for investigation of yield and fiber traits. The same operation was carried out twice, in 2011 and 2012.

Plant type traits at seedling stage
The following plant type traits of the parents, F1, and F2 individual plants were investigated at the cotton seedling stage: plant height (PH1, cm); cotyledonary node height (CNH, cm); first true leaf height (FTLH, cm); second true leaf height (STLH, cm); distance between the cotyledonary node and first true leaf (D1, cm); and distance between the first true leaf and second true leaf (D2, cm). Each measurement was repeated three times and the average value was used in the analysis.

Physiological traits at seedling stage
Physiological characteristics such as leaf area and photosynthetic rate were measured in the parents, F1, and F2 individual plants at the cotton seedling stage. A portable leaf area meter (CI-202, Portable Laser Leaf Area Meter, USA) was used to measure the second true leaf area (SLA, cm²). At the same time, from 8:00 to 11:00 in the morning on a sunny day, a Li-6400 portable photosynthesis instrument was used to determine the photosynthetic rate (Pn, μmol CO2·m⁻²·s⁻¹) of the second true leaf. Also measured were intercellular CO2 concentration (Ci, μmol·mol⁻¹), stomatal conductance (Cond, mmol·m⁻²·s⁻¹), and transpiration rate (Tr, g·m⁻²·h⁻¹). The intensity of the built-in light source was set to 1200 μmol·m⁻²·s⁻¹, each leaf was measured three times, and the average value was used in the analysis. For the instrument principle, sampling technique, and detailed settings, refer to "Using the LI-6400 Portable Photosynthesis System."

Determination of chlorophyll content in leaves
The leaf chlorophyll content of the parents, F1, and F2 individual plants was determined with a UV/visible spectrophotometer. The main stem functional leaves were collected from each individual plant, and ten pieces were cut out with a 9-mm punch and weighed. About 0.1-0.2 g of leaves was then placed in a 10-ml test tube, the fresh weight recorded, 10 ml of 95% ethanol added, and the tube sealed and stored for 48 h in the dark. Tubes were shaken midway through the incubation and mixed until the leaves were completely white. After the incubation, the extracted chlorophyll of each sample was placed in a spectrophotometer and the optical density was measured at 665 nm, 649 nm, and 470 nm to determine chlorophyll a (Chl a), chlorophyll b (Chl b), and carotenoid (Car) content, respectively. Subsequently, the chlorophyll a/b ratio (Chl a/b) and total chlorophyll (Total Chl) were calculated. Each sample was measured three times, and the average was taken as the result. Pigment concentrations were calculated according to the following formulas: in which Ca, Cb, and Cx•c represent the concentration in mg/L of chlorophyll a, chlorophyll b, and carotenoids, respectively.
The pigment content of the leaves was then calculated as follows: where C represents the pigment concentration (mg/L), V represents the total amount of extract (ml), W represents the fresh weight of the sample (g), and the subscript x represents the pigment: chlorophyll a or b, or carotenoids.
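The formulas referred to above are not reproduced in this text. As a hedged illustration only, the sketch below assumes the widely used Lichtenthaler-type coefficients for 95% ethanol extracts read at 665, 649, and 470 nm; these coefficients are an assumption and are not confirmed by this article. The conversion to content per gram follows directly from the stated units of C (mg/L), V (ml), and W (g). The function names are hypothetical.

```python
def pigment_concentrations(a665, a649, a470):
    """Return (Ca, Cb, Cxc) in mg/L from absorbance readings.

    ASSUMPTION: coefficients commonly cited for 95% ethanol extracts;
    the article's own formulas are missing from this text.
    """
    ca = 13.95 * a665 - 6.88 * a649          # chlorophyll a
    cb = 24.96 * a649 - 7.32 * a665          # chlorophyll b
    cxc = (1000.0 * a470 - 2.05 * ca - 114.8 * cb) / 245.0  # carotenoids
    return ca, cb, cxc


def pigment_content(c_mg_per_l, v_ml, w_g):
    """Pigment content in mg per g fresh weight.

    C is in mg/L and V in ml, so V is divided by 1000 to convert to litres.
    """
    if w_g <= 0:
        raise ValueError("fresh weight must be positive")
    return c_mg_per_l * v_ml / (1000.0 * w_g)


if __name__ == "__main__":
    ca, cb, cxc = pigment_concentrations(a665=0.8, a649=0.4, a470=0.6)
    total_chl = ca + cb      # total chlorophyll, mg/L
    ratio_ab = ca / cb       # chlorophyll a/b ratio
    print(round(ca, 3), round(cb, 3), round(cxc, 3))
    print(round(pigment_content(ca, v_ml=10.0, w_g=0.1), 4))
```

The absorbance values in the usage lines are made-up inputs chosen only to exercise the functions.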

Plant type traits at flowering and boll stage
Plant height (PH2) and fruit branch number (FBN) were investigated at the first flowering and boll stage in the parents, F1, and F2 individual plants.

Fiber quality traits
Middle and upper fibers were collected from the parents, F1, and F2 individual plants and sent for testing at the Cotton Quality Supervision, Inspection and Testing Center of the Ministry of Agriculture (HVI SPECTRUM, version 4.05.01, HVICC calibration level). The fiber quality properties tested were fiber length (FL), fiber strength (FS), micronaire value (MIC), fiber length uniformity (FU), and fiber elongation (FE). High temperatures and excessive rain in the summer of 2011 caused pollen abortion, and the interspecific (G. barbadense × G. hirsutum) hybrid population showed transgressive segregation; as a result, some families failed to produce enough mature fiber, leaving yield and quality trait data missing for some lines.

Population DNA preparation, resequencing, and genotyping
Sequencing libraries were constructed with an insert size of 150 bp and sequenced on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA). To construct paired-end libraries, DNA was fragmented by sonication, and DNA ends were blunted before adding an A base to each 3′ end. DNA adaptors with a single T-base 3′ end overhang were ligated to the above products. Ligation products were purified on 2% agarose gels that each targeted a specific range of insert sizes. Quantification and quality assessment were carried out by running 1 μL of the library on an Agilent DNA 1000 LabChip analyzer (Agilent Technology 2100 Bioanalyzer). All raw reads were processed for quality control and filtered using fastp (https://github.com/OpenGene/fastp) with default parameters. The clean reads were mapped to the TM-1 reference genome [38] using the Burrows-Wheeler Aligner (BWA) with the parameters 'mem -t 20 -M -R'. The mapping results were sorted and duplicates marked using functions implemented in SAMtools and Picard (http://broadinstitute.github.io/picard/). Only reads that mapped uniquely to the reference genome sequence were used to call SNPs. Identification of SNPs between the parental lines and F2 individuals was performed with the Genome Analysis Toolkit 4 (GATK4). High-quality SNPs were filtered following the best practices workflow developed by the GATK team. SNPs with a minor allele frequency (MAF) < 5% and those represented in less than 30% of the F2 population were excluded using VCFtools. Polymorphic markers between the two parental lines were retained if they had the aa × bb segregation pattern in F2 individuals.
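The SNP filtering criteria described above (MAF, representation in the population, and the aa × bb parental pattern) can be sketched in a few lines. This is a minimal illustration, not the VCFtools workflow used in the study; the genotype coding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, None = missing) and all function names are assumptions introduced here.

```python
def minor_allele_frequency(genotypes):
    """MAF among called genotypes coded as alternate-allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def keep_snp(parent_a, parent_b, f2_genotypes, min_maf=0.05, min_call_rate=0.30):
    """Apply the three filters: aa x bb parents, MAF >= 5%, call rate >= 30%."""
    # aa x bb: parents are homozygous for different alleles
    parents_aa_bb = {parent_a, parent_b} == {0, 2}
    return (
        parents_aa_bb
        and minor_allele_frequency(f2_genotypes) >= min_maf
        and call_rate(f2_genotypes) >= min_call_rate
    )


if __name__ == "__main__":
    f2 = [0, 1, 1, 2, None]          # toy F2 genotypes for one SNP
    print(keep_snp(0, 2, f2))        # parents differ, SNP segregates
    print(keep_snp(0, 0, f2))        # parents identical, not informative
```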

Bin map construction
Recombinant breakpoints were identified using a slightly modified sliding window approach based on the ratio of SNP alleles derived from TM-1 and Hai7124 [38]. Consecutive 100-Kb intervals having the same genotype in the whole F2 population were merged as a recombination bin. Bins with significantly distorted segregation (P-value < 0.001) were filtered using the Chi-square test, and those remaining served as genetic markers for the construction of a genetic linkage map using IciMapping [45]. Collinearity between the genetic map and physical positions was visualized using ALLMAPS (https://github.com/tanghaibao/jcvi/wiki/ALLMAPS). A region containing three or more closely linked bins that exhibited significant segregation distortion (P < 0.001) was defined as a segregation distortion region (SDR).
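The two steps described above, merging consecutive windows with identical genotypes into bins and testing each bin against the 1:2:1 ratio expected in an F2, can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: genotypes are coded "A" (TM-1 homozygote), "H" (heterozygote), and "B" (Hai7124 homozygote), and the function names are hypothetical. With three genotype classes the test has 2 degrees of freedom, for which the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def merge_windows(window_genotypes):
    """Merge consecutive windows whose genotype calls match across all individuals.

    window_genotypes: list of tuples, one tuple per consecutive 100-kb window,
    each holding one genotype call per F2 individual.
    Returns a list of (first_window_index, last_window_index, calls).
    """
    bins = []
    for idx, calls in enumerate(window_genotypes):
        if bins and bins[-1][2] == calls:
            start = bins[-1][0]
            bins[-1] = (start, idx, calls)   # extend the current bin
        else:
            bins.append((idx, idx, calls))   # start a new bin
    return bins


def segregation_p_value(calls):
    """Chi-square test of observed A:H:B counts against 1:2:1 (2 d.f.)."""
    observed = [calls.count("A"), calls.count("H"), calls.count("B")]
    n = sum(observed)
    if n == 0:
        raise ValueError("no called genotypes")
    expected = [0.25 * n, 0.50 * n, 0.25 * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.exp(-chi2 / 2.0)  # exact survival function for 2 d.f.


if __name__ == "__main__":
    windows = [("A", "H", "B", "H"), ("A", "H", "B", "H"), ("A", "H", "H", "H")]
    for first, last, calls in merge_windows(windows):
        distorted = segregation_p_value(calls) < 0.001
        print(first, last, distorted)
```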

Statistics of phenotypic traits
For all traits, ANOVA was used to test for significant differences between parents, F1, and F2 individuals, and correlation coefficients and phenotypic variation were also calculated using SPSS v18.0 (SPSS, Chicago, IL, USA). The heterosis (H) of each trait is expressed by two values, mid-parent heterosis and over-parent heterosis: MH = (F1 − MP)/MP × 100%, where MP is the average value of the parents.
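A short sketch of the heterosis calculation follows. Mid-parent heterosis uses the formula given above. The text does not state the formula for over-parent heterosis, so the version here, relative to the higher-valued parent, is an assumption based on the conventional definition; function names are hypothetical.

```python
def mid_parent_heterosis(f1, parent1, parent2):
    """MH = (F1 - MP) / MP x 100, where MP is the mean of the two parents."""
    mp = (parent1 + parent2) / 2.0
    return (f1 - mp) / mp * 100.0


def over_parent_heterosis(f1, parent1, parent2):
    """ASSUMED definition: (F1 - HP) / HP x 100, HP being the higher parent."""
    hp = max(parent1, parent2)
    return (f1 - hp) / hp * 100.0


if __name__ == "__main__":
    # Toy trait values, not data from the study
    print(round(mid_parent_heterosis(120.0, 100.0, 80.0), 2))
    print(round(over_parent_heterosis(120.0, 100.0, 80.0), 2))
```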

QTL mapping
IciMapping 3.0 (http://www.isbreeding.net) was used to detect the effects of QTLs in the F2 population. An LOD threshold of 2.5 was used to define significant additive QTLs; that is, when LOD ≥ 2.5 for a marker interval, it was considered to contain a significant QTL. At the same time, the additive effect (A), dominant effect (D), and contribution rate (R²) of each QTL on the corresponding traits were calculated. The genetic action mode of each QTL was judged from the absolute value of D/A: a value greater than 1.20 indicates an over-dominant effect, 0.81-1.20 a dominant effect, 0.21-0.80 a partially dominant effect, and less than 0.20 an additive effect. The method of naming QTLs follows that used for rice: QTL + trait + chromosome + QTL number.
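The classification and naming rules above translate directly into code. The sketch below is illustrative; the published ranges leave small gaps at 0.20-0.21 and 0.80-0.81, and assigning those boundary values to the lower class is an assumption made here. The name format mirrors QTL names that appear in this article (for example qCNH-A12 and qFE-D05-1); function names are hypothetical.

```python
LOD_THRESHOLD = 2.5


def is_significant(lod):
    """A marker interval is taken to contain a QTL when LOD >= 2.5."""
    return lod >= LOD_THRESHOLD


def gene_action(additive, dominance):
    """Classify QTL action mode from |D/A|."""
    if additive == 0:
        raise ValueError("additive effect is zero; |D/A| is undefined")
    ratio = abs(dominance / additive)
    if ratio > 1.20:
        return "over-dominant"
    if ratio > 0.80:        # boundary handling is an assumption
        return "dominant"
    if ratio > 0.20:        # boundary handling is an assumption
        return "partially dominant"
    return "additive"


def qtl_name(trait, chromosome, number=None):
    """Build a name such as qFE-D05-1 (q + trait + chromosome + number)."""
    name = f"q{trait}-{chromosome}"
    return name if number is None else f"{name}-{number}"


if __name__ == "__main__":
    print(is_significant(3.1), gene_action(2.0, -1.0), qtl_name("FE", "D05", 1))
```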

Candidate gene identification and expression
The putative candidate genes for the QTLs were predicted as follows. First, we analyzed the SNP types located in QTLs based on our assembled genome sequence for TM-1. We focused on significantly associated nonsynonymous SNPs located in exons or SNPs in the upstream regions of the candidate genes. Second, based on expression profiling data for sixteen vegetative and reproductive tissues from TM-1 (cotton.zju.edu.cn), we checked whether these selected candidate genes were dominantly and/or specifically expressed in a developmental stage that is critical for the target trait. We further narrowed down the candidate genes according to differences in their expression levels between TM-1 and Hai7124 (cotton.zju.edu.cn).

High-density genetic map construction and characteristics of the bin marker loci
We developed an interspecific F2 population from a cross between G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124, which contained 249 individuals in total. Whole-genome sequencing of all individuals was performed on an Illumina HiSeq 2000. In total, 3.01 Tb of clean reads were generated, with an average genome coverage depth of 5.3× for each individual. For the parents TM-1 and Hai7124, we utilized clean data from our previous research totaling 185 Gb and 111.8 Gb, respectively [20], with an average depth of over 50×. All clean reads were mapped to the TM-1 reference genome. After filtering SNPs by the established criteria, a total of 4,257,943 high-quality SNPs (Fig. 1) were retained and used to generate bin markers (a group of consecutive SNPs in the same block for genotyping) with a modified sliding window approach [33]. After filtering out 1428 bins that exhibited significant segregation distortion (P < 0.001), a total of 6303 bin markers were generated, with an average length of 363.1 Kb (Table 1, Fig. 1). Finally, the high-density genetic map was constructed, covering 5057 cM with an average inter-bin genetic distance of 0.8 cM (Fig. 1, Table 1). The 26 linkage groups of the map corresponded to the 26 cotton chromosomes. Each linkage group contained 242.4 bins on average, ranging from 157 (D04) to 405 (A11), with 3,455 bins in the A subgenome and 2,848 in the D subgenome. The total length of the A subgenome was 2663.24 cM, and that of the D subgenome was 2393.89 cM. The longest linkage group was A11 (284.58 cM), and the shortest was A08 (126.68 cM). The largest average distance between markers was 1.1 cM in the D07 linkage group, while the smallest was 0.64 cM in A10 (Table 1).
Fig. 1 High-density genetic map construction of the (TM-1 × Hai7124) F2 population. A Bin maps for the 241 scored F2 individual lines. Colored tracks represent the 241 individual lines of the THF2 population that were used for linkage map construction: red, alleles inherited from the maternal parent (TM-1); green, alleles inherited from the paternal parent (Hai7124); blue, heterozygous genotype, (TM-1 × Hai7124) F1. The horizontal scale indicates physical distance. B Distribution of markers across the 26 chromosomes; the ordinate is genetic distance (cM). C Genetic map quality as indicated by recombination fractions of all markers
A total of fourteen gaps larger than 10 cM were distributed across the 26 chromosomes, seven in the A subgenome and seven in the D subgenome. On average, more than 99% of the bin marker intervals in each linkage group were shorter than 5 cM. A region containing three or more closely linked bins that exhibited significant segregation distortion (P < 0.001) was considered a segregation distortion region (SDR). There were 88 and 32 SDRs in the A and D subgenomes, respectively (Table 1). The quality of the genetic map was further examined by comparing genetic and physical distances, which showed good collinearity (Supplementary Fig. 1).
Chi-square tests of the 6303 co-dominant bins identified 724 that do not conform to the expected 1:2:1 Mendelian segregation ratio. Among these 724 segregation-distorted bins, 86 were biased toward the parent TM-1, 638 toward the parent Hai7124, and none toward the heterozygote. In addition, more of the segregation-distorted bins were located on the A subgenome (450) than on the D subgenome (274), and these bins comprised a higher proportion of the A subgenome (13.02%) than of the D subgenome (9.62%). Moreover, the segregation-distorted bins were unevenly distributed across the 26 chromosomes; the ratio of distorted bins to total bins in a given chromosome was more than 30% on chromosomes A05, A11, and D07 and more than 20% on A08, D09, and D10, but less than 1% on A01, A03, and D01. At the same time, some distorted bins were clustered; namely, bins distributed on four chromosomes (A05, A11, D07, and D08) accounted for 45% of all segregation-distorted bins (Supplementary Table 1). To provide a comprehensive overview of recombination in cotton, the recombination rate along each chromosome was estimated by comparing genetic and physical distances. Across the entire genome, the average recombination rate was 2.2 cM/Mb. High rates of recombination were observed in the telomere regions of all nine chromosomes, whereas recombination was suppressed in centromere regions (Fig. 2). Chromosomal regions with recombination rates greater than 1.0 cM/Mb [37] were defined as recombination hot regions (RHRs). A total of 3380 RHRs were identified, and these were unevenly distributed across the 26 chromosomes (Table 1, Fig. 2).
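The recombination rate comparison described above can be illustrated with a small sketch. The text does not say over which physical interval each rate was computed, so the code assumes rates are taken between adjacent bin markers; that choice, the input layout, and the function names are assumptions. Only the 1.0 cM/Mb threshold comes from the text.

```python
RHR_THRESHOLD_CM_PER_MB = 1.0


def interval_rates(markers):
    """Recombination rate (cM/Mb) between adjacent markers on one chromosome.

    markers: list of (physical_position_mb, genetic_position_cm),
    sorted by physical position.
    Returns a list of (start_mb, end_mb, rate_cm_per_mb).
    """
    rates = []
    for (mb0, cm0), (mb1, cm1) in zip(markers, markers[1:]):
        span_mb = mb1 - mb0
        if span_mb <= 0:
            continue  # skip duplicated or unordered positions
        rates.append((mb0, mb1, (cm1 - cm0) / span_mb))
    return rates


def hot_regions(rates, threshold=RHR_THRESHOLD_CM_PER_MB):
    """Intervals whose rate exceeds the RHR threshold."""
    return [r for r in rates if r[2] > threshold]


if __name__ == "__main__":
    # Toy chromosome: high recombination near the end, low in the middle
    toy = [(0.0, 0.0), (1.0, 3.0), (3.0, 4.0), (4.0, 4.5)]
    for start, end, rate in hot_regions(interval_rates(toy)):
        print(start, end, rate)
```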

Analysis of 35 traits in parents and F1 and F2 generations
We surveyed 35 traits in the parents and F1 and F2 generations, including six plant type traits, ten leaf morphology and physiological traits at the seedling stage, five leaf chlorophyll content traits, two plant type traits at the flower and boll stage, seven yield traits, and five fiber quality traits (Supplementary Table 2).
TM-1, Hai7124, and their F1 progeny differed to varying degrees in plant type, leaf morphology, physiology, yield, and fiber quality. Concerning plant type traits, TM-1 and F1 had extremely significant differences; TM-1 and Hai7124 likewise had extremely significant differences, except in CNH; but Hai7124 and F1 had no extremely significant differences in traits except for D1. Regarding leaf morphology and physiological traits, TM-1 exhibited extremely significant differences from Hai7124 and from F1 only in SLA and SPn; other traits were not significantly different among the three. In terms of chlorophyll content, TM-1 and F1 exhibited extremely significant differences; TM-1 and Hai7124 likewise had extremely significant differences in traits other than Chl a; but Hai7124 and F1 did not differ significantly except in Chl a/b. With regard to the 12 yield and fiber traits, G. hirsutum and G. barbadense are distinguished by extremely significant differences; most of these differences were observed in comparisons of TM-1 with Hai7124 and of TM-1 with F1 individuals. When comparing Hai7124 and F1, only the five traits D1, SCY, LY, SI, and LI differed significantly, indicating that the F1 progeny of G. barbadense and G. hirsutum are more biased towards the G. barbadense phenotype. Taken together, these genetic differences provide a good basis for the screening of important trait QTLs (Supplementary Table 3).
In the F2 population, the average value and variance of each trait exhibited large changes relative to the parents, and the coefficients of variation differed between traits. Overall, physiology and yield traits featured the largest coefficients; the values for each yield component ranked as follows: BN > SCY > LY > LI > BW > SI > LP. This ranking shows that different traits carry different degrees of genetic variation in the offspring, suggesting that these traits are controlled by multiple genes (Table 2, Supplementary Fig. 2).

QTL mapping of important agronomic traits in cotton
A total of 112 QTLs for the 35 traits, 41 in the A subgenome and 71 in the D subgenome, were detected using ICIM analysis; they were distributed across all 26 chromosomes except A03, A08, and D08. The position, LOD score, additive effect, dominance effect, and percentage of phenotypic variance explained (PVE) of the QTLs are given in Table 3. Among them, 16 QTLs overlapped with QTL regions reported in previous studies (Table 4). PVE values ranged from 2.95 to 24.89%. The regions occupied by the identified QTLs ranged in size from 0.20 to 8.45 Mb, with an average length of 0.78 Mb. The number of QTLs per trait ranged from 0 to 10, with the most QTLs (10) detected for STr.
Twelve QTLs were associated with plant type traits at the seedling stage, most of which (75%) had positive effects and originated from TM-1, suggesting that G. hirsutum has a growth advantage in the seedling stage. Among these QTLs, the PVE varied from 4.46 to 8.35%; the QTL qCNH-A12 with the highest PVE (8.35%) had positive effects and came from Hai7124. Thirty-seven QTLs were detected for leaf morphology and physiological traits at the seedling stage, featuring positive effects and coming from both TM-1 and Hai7124 (19 and 18 QTLs, respectively). We found that all nine QTLs associated with intercellular CO2 concentration had positive effects and originated from TM-1, and seven of the nine demonstrated positive effects, which are the main component of heterosis. A total of 17 QTLs were identified for leaf chlorophyll, with PVE values ranging from 4.43 to 8.1%; both the additive and dominant effects of these QTLs were close to 0.
Twenty-six QTLs were identified for yield or yield-related traits. Most QTLs for PH2 and FBN, and all those for SCY, LY, and SI, exhibited positive effects and came from Hai7124. Meanwhile, QTLs with positive effects for BW, LP, and LI came from TM-1, suggesting that G. barbadense has a larger biomass but G. hirsutum has a higher fiber yield. Of the QTLs associated with fiber quality traits, 80% of those having positive effects came from Hai7124; only four QTLs (qFL-A06, qMIC-D01, qFE-D05-1, and qFE-D05-2) with positive effects originated from TM-1. This result indicates that the alleles underlying excellent fiber quality come mainly from G. barbadense. All QTLs and the corresponding location information, LOD, PVE, additive effect, and dominant effect values are presented in Table 3 and Fig. 3.

Candidate gene identification and expression analysis
We identified ten genes that had nonsynonymous SNPs in exons or SNPs in their upstream regions and were located within the 15 loci of interest for 14 traits (D1, CNH, FTLH, PH1, SCi, TCi, Tcond, BW, SCY, LP, LI, SPn, TPn, TTr). We analyzed their expression in sixteen vegetative and reproductive tissues of TM-1 and compared the values with those in Hai7124 (Supplementary Fig. 2, Supplementary Tables 4 and 5). Some SNP variants corresponding to QTLs associated with different traits were mapped to the same position or related to the same gene, such as GH_A04G0054/GB_A04G0055, GH_D04G1426/GB_D04G1512, and GH_D10G0500/GB_D06G1730 (Supplementary Tables 4 and 5).
A representative QTL related to multiple traits (BN, SCY, LP, and LI) was located on chromosome D10 (Fig. 4A). This locus encompassed fourteen genes harboring nonsynonymous SNPs. Considering the expression of these genes during fiber development, one was identified as a putative causal gene: root hair defective 3 GTP-binding protein (GhRHD3, GH_D10G0500), which was dominantly expressed during secondary cell-wall biosynthesis (20 DPA) (Fig. 4B-C). Interestingly, its Hai7124 homolog showed high expression during fiber initiation (0, 1, and 3 DPA) (Fig. 4D). Three nonsynonymous SNPs in GH_D10G0500, D10Gh:4,228,677/4,228,733/4,229,273 (TTG versus GCA, 33:58), demonstrated significant associations with BN, SCY, LP, and LI (Fig. 4E-H). The orthologous gene in Arabidopsis thaliana was identified as being involved in the regulation of cell expansion [58-60].
Supplementary Fig. 3 illustrates a QTL, located on chromosome D04, that was related to SCi, TCi, and Tcond. There were thirteen genes harboring nonsynonymous SNPs in this region. Considering their expression levels during fiber development, one was identified as a putative causal gene: glycerol-3-phosphate acyltransferase 6 (GhGPAT6, GH_D04G1426), which was highly expressed in leaves. Within this gene, the nonsynonymous SNP D04Gh:47,064,565 (CC versus AA, 65:50) was significantly associated with SCi, TCi, and Tcond. As reported, its orthologous gene in tomato is involved in regulating the outer wall diameter of leaf epidermal cells [61].

Bin markers are effective for constructing high-density genetic maps and QTL fine mapping in G. hirsutum and G. barbadense
In recent years, scientists have used specific-locus amplified fragment sequencing (SLAF-seq), genotyping by sequencing (GBS), and other sequencing methods to genotype the complex genome of cotton, and the resulting genetic map is based on SNP phasing. This method can identify markers with high throughput; in addition, the chromosome coverage is more uniform and the marker density greatly improved compared with traditional PCR-based markers. With the help of newly developed bioinformatics software, it is possible to complete genotyping and construct genetic maps in a very short time. Compared to the GBS-based enzyme digestion method, a bin map based on resequencing offers the following improvements: scanning of and mutation identification at all sites in the whole genome, without any prior marker information, yielding complete allelic variant information with higher accuracy than previous experimental methods.
In this study, we obtained a total of 6303 high-confidence bin markers that not only extend the length of the cotton genetic map but also improve its resolution. In our previous studies, we constructed an SSR-based genetic map using a (TM-  Table 3; Table 2).

Non-uniform distribution of QTLs in the A and D subgenomes
In this study, a total of 112 QTLs were detected, of which 71 were in the D subgenome, much more than the 41 in the A subgenome (Table S6). Of QTLs associated with the six plant type traits at seedling stage, more were sited in the A subgenome than in the D subgenome; in contrast, QTLs associated with the other ten leaf morphology and physiological traits at seedling stage, five traits reflecting leaf chlorophyll content, two plant type traits at flower and boll stage, seven yield traits, and five fiber quality traits were all less commonly located in the A subgenome than in the D subgenome. In particular, the D subgenome showed a strong advantage with regard to leaf chlorophyll content, yield traits, and fiber quality traits. This is consistent with previous reports that the D subgenome contributes more to the genetic control of fiber [62][63][64], and suggests that molecular marker selection in the D subgenome may be more efficient for breeding to improve yield and fiber quality.

QTLs and candidate genes may contribute to the improvement of cotton through breeding
Studies involving cotton QTL mapping and candidate gene identification generally focus on traits related to yield and fiber quality; considerably less research has been conducted concerning seedling traits, leaf physiology, and chlorophyll content. Nonetheless, these traits are also important for cotton growth: plant height and leaf area at the seedling stage determine growth vigor, which in turn affects stress resistance; meanwhile, favorable leaf physiology and chlorophyll content can enhance photosynthesis efficiency and solar energy utilization, eventually helping adaptation to dense planting and increasing production. Here, the candidate gene GH_D04G1426 demonstrated significant associations with SCi, TCi, and Tcond. Its orthologous gene in tomato has been reported to affect the outer wall diameter of leaf epidermal cells; such functionality may indirectly affect photosynthesis in cotton [61]. Looking beyond direct effects on yield and fiber quality, other QTLs and candidate genes in our data may provide additional solutions for cotton molecular breeding.

Conclusions
In conclusion, we constructed a high-density genetic map based on the resequencing data of 249 individuals from an interspecific F2 population (TM-1 × Hai7124). This genetic map consists of 6303 high-confidence bin markers spanning 5057.13 cM across 26 chromosomes. Based on this map, 112 QTLs relating to agronomic and physiological traits from the seedling to the boll opening stage were identified. Through analysis of the sequence and expression of the candidate genes within the QTL regions, ten putative causal genes that might be responsible for the target traits were identified. Of these, GhRHD3 (GH_D10G0500) was associated with fiber yield, and GhGPAT6 (GH_D04G1426) might play an important role in photosynthesis efficiency.